Refactor llama-model.cpp #16252
| This is a nightmare to review and rebase until merged though, also you seem to have pushed the same changes to your  | 
| @CISC Ye, need them for working there, but I'll revert once I'm done (unless this is done first and I can merge on top). I know it's a nightmare, I already kind of went through it when I asked an LLM to automate some tasks and it proceeded merrily ripping out methods just because some classes didn't inherit from  If you want, I can write a script that runs tree-sitter on the original definitions in llama-model.cpp vs the new classes and shows any differences to verify that nothing was accidentally lost. | 
| Maybe it would be easier to refactor that partially, just a subset of the models? | 
| This seems to be a good change, just have some other ideas: |
| All right, I've done the changes suggested by @ngxson - renamed the files to just .cpp and merged all the header files into one models.h. I've also merged all the current changes from master. Could we possibly move this forward somehow? This isn't going to get any easier any time soon... @ggerganov ? | 
| Yes, thanks for the help. I will take a look soon at the Qwen 3 Next PR and also this reorganization here. Likely over the weekend. | 
| The diff is too big, so I had a look at some files under your fork instead. It looks good overall, just one nitpick: it seems like many files start with this pattern: While it's cleaner to be: Should be a real quick regex replace to change |
| @ngxson aye, done. | 
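The verification idea raised earlier in the thread (comparing the original definitions in llama-model.cpp against the new classes to confirm nothing was lost) could be approximated as below. This is only a minimal sketch: it uses naive parenthesis and brace matching rather than tree-sitter, it looks only at constructor bodies whose name starts with `llm_build_`, and it assumes no unbalanced braces inside strings or comments. The helper names `extract_ctor_bodies` and `diff_builders` are hypothetical and not part of llama.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

static bool is_ident(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Drop all whitespace so that re-indentation does not count as a change.
static std::string strip_ws(const std::string & s) {
    std::string out;
    for (unsigned char c : s) {
        if (!std::isspace(c)) out += static_cast<char>(c);
    }
    return out;
}

// Index of the closer matching the opener at src[open], or npos if unbalanced.
static size_t match(const std::string & src, size_t open, char o, char c) {
    assert(open < src.size() && src[open] == o);
    int depth = 0;
    for (size_t i = open; i < src.size(); ++i) {
        if (src[i] == o) depth++;
        else if (src[i] == c && --depth == 0) return i;
    }
    return std::string::npos;
}

// Map each `<prefix>*` constructor name to its whitespace-stripped body.
// Works for both in-class definitions and `X::X(...) { ... }` definitions.
std::map<std::string, std::string> extract_ctor_bodies(
        const std::string & src, const std::string & prefix = "llm_build_") {
    std::map<std::string, std::string> out;
    size_t pos = 0;
    while ((pos = src.find(prefix, pos)) != std::string::npos) {
        size_t end = pos;
        while (end < src.size() && is_ident(src[end])) end++;
        const std::string name = src.substr(pos, end - pos);
        const bool starts_ident = pos == 0 || !is_ident(src[pos - 1]);
        pos = end;
        // Only a name directly followed by '(' can be a constructor here.
        if (!starts_ident || end >= src.size() || src[end] != '(') continue;
        const size_t close = match(src, end, '(', ')');
        if (close == std::string::npos) break;
        // Skip the initializer list; a ';' first means declaration only.
        const size_t stop = src.find_first_of("{;", close);
        if (stop == std::string::npos || src[stop] == ';') continue;
        const size_t body_end = match(src, stop, '{', '}');
        if (body_end == std::string::npos) break;
        out[name] = strip_ws(src.substr(stop, body_end - stop + 1));
        pos = body_end + 1;
    }
    return out;
}

// Report builders that are missing, changed, or newly added after the refactor.
std::vector<std::string> diff_builders(const std::string & before,
                                       const std::string & after) {
    const auto a = extract_ctor_bodies(before);
    const auto b = extract_ctor_bodies(after);
    std::vector<std::string> report;
    for (const auto & kv : a) {
        const auto it = b.find(kv.first);
        if (it == b.end())                report.push_back("missing: " + kv.first);
        else if (it->second != kv.second) report.push_back("changed: " + kv.first);
    }
    for (const auto & kv : b) {
        if (!a.count(kv.first)) report.push_back("added: " + kv.first);
    }
    return report;
}
```

To use it, read the pre-refactor llama-model.cpp into one string and the concatenated new files into another, then print whatever `diff_builders` returns. An empty report only means the constructor bodies match after whitespace stripping. Helper methods and anything outside the constructors are not checked, so a real parser such as tree-sitter would still be the more robust choice.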
@ggerganov I know you said you were planning to do it, but honestly it's been a nightmare working on all the model implementations with the huge llama-model.cpp, so I wanted to get the "easy" albeit tedious part out of the way. Moved all llm_build_* definitions into their own class files in src/models/.
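For readers unfamiliar with this kind of split, here is a rough sketch of what moving a builder definition into its own file can look like. It is an illustration under stated assumptions, not the PR's actual code: `llama_model`, `llm_graph_params` and `llm_graph_context` are reduced to hypothetical stand-ins so the snippet compiles on its own, and `llm_build_example` is a made-up builder name.

```cpp
#include <string>

// Stand-ins for the real llama.cpp types (hypothetical, for illustration only).
struct llama_model      { std::string name; };
struct llm_graph_params { int n_tokens = 0; };
struct llm_graph_context {
    int         n_tokens;
    std::string built;
    explicit llm_graph_context(const llm_graph_params & params)
        : n_tokens(params.n_tokens) {}
};

// What would sit in a shared header such as models.h: the declaration only.
struct llm_build_example : public llm_graph_context {
    llm_build_example(const llama_model & model, const llm_graph_params & params);
};

// What would sit in its own file under src/models/: the definition.
llm_build_example::llm_build_example(const llama_model & model,
                                     const llm_graph_params & params)
    : llm_graph_context(params) {
    // The real builders assemble a compute graph here; this placeholder
    // just records that the constructor ran.
    built = "graph for " + model.name;
}
```

With this layout, a change to one model's graph code touches only that model's file, and the shared header changes only when a builder is added or removed.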